Normalized Random Measures



Abstract 

This paper presents theory for Normalized Random Measures (NRMs), 
Normalized Generalized Gammas (NGGs), a particular kind of NRM, and 
Dependent Hierarchical NRMs which allow networks of dependent NRMs to 
be analysed. These have been used, for instance, for time-dependent topic 
modelling. In this paper, we first introduce some mathematical background 
of completely random measures (CRMs) and their construction from Poisson 
processes, and then introduce NRMs and NGGs. Slice sampling is also
introduced for posterior inference. The dependency operators on Poisson
processes and on the corresponding CRMs and NRMs are then introduced, and
posterior inference for the NGG is presented. Finally, we give dependency and
composition results for applying these operators to NRMs, so that they can be
used in a network with hierarchical and dependent relations.

Keywords 

completely random measures; normalized random measures; normalized
generalized Gamma process; dependent hierarchical normalized random
measures; hierarchical models
1 Introduction

This paper presents theory for Normalized Random Measures (NRMs), Normalized 
Generalized Gammas (NGGs), a particular kind of NRM, and Dependent Hierar- 
chical NRMs which allow networks of dependent NRMs to be analysed. These have 
been used, for instance, for time-dependent topic modelling [CDB12] . 

Dependency models are becoming increasingly popular in machine learning
because real data are typically correlated rather than independent. The
pioneering work of MacEachern [Mac99, Mac00] treats the jumps and atoms as
stochastic across dependent models. While there are many ways of constructing
dependent nonparametric models, e.g., from a stick-breaking construction
[GS09] or from a hierarchical construction [TJBB06], in this paper, following
the idea of [LGF10], we construct dependent normalized random measures from
the underlying Poisson processes of the corresponding completely random
measures [Kin67]. This construction is intuitive and allows flexible control
of the dependencies. A related construction in the statistical literature is
by Lijoi et al. [A. 12], which deals with modeling two groups of data.

In this paper, we first introduce in Section 2 some mathematical background on
completely random measures (CRMs) and their construction from Poisson
processes, and then introduce NRMs and NGGs. Slice sampling is also introduced
for posterior sampling of NRMs using techniques from [GW11]. The dependency
operators on Poisson processes and on the corresponding CRMs and NRMs are then
introduced in Section 3, following the work of [Kin93, LGF10]. Posterior
inference for the NGG is then developed in Section 4 based on the results of
[JLP09]. Then we give the dependency and composition results for applying
these operators to NRMs in Section 5. Proofs are given in the Appendix,
Section A.

2 Background 

In this section we briefly introduce background on Poisson processes, the
corresponding completely random measures, dependency operations on these
random measures, and normalized random measures.

Section 2.1 explains how to construct completely random measures from Poisson
processes. Section 3.1 introduces operations on Poisson processes to construct
dependent Poisson processes. Section 3.2 adapts these operations to the
corresponding completely random measures (CRMs). Constructing normalized
random measures (NRMs) from CRMs is discussed in Section 2.2 along with
details of the Normalized Generalized Gamma (NGG), a particular kind of NRM
for which the details have been worked out. A slice sampler for sampling an
NRM is described in Section 2.3.

We first give an illustration of the basic construction of an NRM on a target
domain $\mathbb{X}$. The Poisson process is used to create a countable (and
usually infinite) set of points in the product space of $\mathbb{R}^+$ with the
target domain $\mathbb{X}$, as shown in the left of
Figure 1. The distribution is then a discrete one on these points. The distribution
can be pictured by dropping lines from each point (t,x) down to (0,x), and then 
normalizing all these lines so their sum is one. The resulting picture shows the set 
of weighted impulses that make up the constructed NRM on the target domain. 



[Figure 1 panels: "Counting process" $N(\cdot)$ and "Completely random measure" $\tilde\mu(\cdot)$.]

Figure 1: Constructing a completely random measure from a counting process
$N(\cdot)$ with points at $(J_k, x_k)$.



2.1 Constructing Completely Random Measures from Poisson processes



In contrast to the general class of completely random measures (CRMs) [Kin67],
which admit a unique decomposition as the sum of three parts: a deterministic
measure, a purely atomic measure with fixed atom locations, and a measure¹ with
random jumps and atoms, in this paper we restrict attention to the class of
pure jump processes [FK72], which have the following form

$$\tilde\mu = \sum_{k=1}^{\infty} J_k \delta_{x_k}, \tag{1}$$

where $J_1, J_2, \cdots > 0$ are called the jumps of the process, and
$x_1, x_2, \cdots$ are a sequence of independent random variables drawn from a
base measurable space $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$.²

It is shown that these kinds of CRMs can be constructed from Poisson processes
with specific mean measures $\nu(\cdot)$. We will start from some definitions.



Poisson Distributions: A random variable $X$ taking values in
$\mathbb{N} = \{0, 1, \cdots, \infty\}$ is said to have the Poisson
distribution with mean $c$ in $(0, \infty)$ if

$$p(X = k \mid c) = \frac{e^{-c} c^k}{k!}, \tag{2}$$

then $X < \infty$ almost surely and $\mathbb{E}[X] = \mathrm{Var}[X] = c$.

1 It can be continuous or discrete.

2 $\mathcal{B}(\mathbb{X})$ means the $\sigma$-algebra of $\mathbb{X}$; we sometimes omit this and use $\mathbb{X}$ to denote the measurable
space.

Poisson Processes: Let $(\mathbb{S}, \mathcal{S})$ be a measure space where
$\mathcal{S}$ is the $\sigma$-algebra of $\mathbb{S}$. Let $\nu(\cdot)$ be a
measure on it. A Poisson process on $\mathbb{S}$ is defined to be a random
subset $\Pi \in \mathbb{S}$ such that if $N(A)$ is the number of points of
$\Pi$ in the measurable subset $A \subset \mathbb{S}$, then

a) $N(A)$ is a random variable having the Poisson distribution with mean $\nu(A)$, and

b) whenever $A_1, \cdots, A_n$ are in $\mathcal{S}$ and disjoint, the random variables
$N(A_1), \cdots, N(A_n)$ are independent.

The integer-valued random measure $N(\cdot)$ is called a Poisson random measure
and the Poisson process is denoted as $\Pi \sim \mathrm{PoissonP}(\nu)$, where
$\nu$ is called the mean measure of the Poisson process.

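Completely Random Measure: In this paper, we define a random measure on
$(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ to be a linear functional of the
Poisson random measure $N(\cdot)$, whose mean measure $\nu(dt, dx)$ is defined
on a product space $\mathbb{S} = \mathbb{R}^+ \times \mathbb{X}$:

$$\tilde\mu(B) = \int_{\mathbb{R}^+ \times B} t\, N(dt, dx), \quad \forall B \in \mathcal{B}(\mathbb{X}). \tag{3}$$

The mean measure $\nu(dt, dx)$ is called the Lévy measure of $\tilde\mu$.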

The general treatment of constructing random measures from Poisson random
measures can be found in [Jam05]. Note that the random measure $\tilde\mu$ in
construction (3) has the same form as Equation (1) because $N(\cdot)$ is
composed of a countable number of points. It can be proven to be a completely
random measure [Kin67] on $\mathbb{X}$, meaning that for arbitrary disjoint
subsets $\{A_i \in \mathbb{X}\}$ of the measurable space, the random variables
$\{\tilde\mu(A_i)\}$ are independent.

For the completely random measure defined above to always be finite, it is
necessary that $\int_{\mathbb{R}^+ \times \mathbb{X}} t\, \nu(dt, dx)$ be
finite, and therefore for every $z > 0$,
$\nu([z, \infty) \times \mathbb{X}) = \int_z^\infty \int_{\mathbb{X}} \nu(dt, dx)$
is finite [Kin93]. It follows that there will always be a finite number of
points with jumps $J_k > z$ for that $z > 0$. Therefore in the bounded product
space $[z, \infty) \times \mathbb{X}$ the measure $\nu(dt, dx)$ is finite. So
it is meaningful to sample those points $(J_k, x_k)$ with $J_k > z$ by first
getting the count of points $K$ sampled from a Poisson with (finite) mean
$\nu([z, \infty) \times \mathbb{X})$, and then to sample the $K$ points
according to the distribution of
$\frac{\nu(dt, dx)}{\nu([z, \infty) \times \mathbb{X})}$.
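The block-wise sampling just described can be sketched in a few lines. This is only an illustration under stated assumptions: the Lévy measure is taken to be $M \frac{a}{\Gamma(1-a)} t^{-1-a}\,dt\,H(dx)$ (a stable-type density chosen because its tail integral has a closed form), $H$ is taken to be uniform on $[0, 1]$, and the function names are hypothetical.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential arrivals in [0, mean]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= mean:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def sample_points_above(z, mass, a, rng):
    """Sample all points (J_k, x_k) of the Poisson process with J_k >= z.

    Assumed Levy measure: mass * a / Gamma(1 - a) * t**(-1 - a) dt H(dx),
    with H uniform on [0, 1] (both are illustrative assumptions).
    """
    # nu([z, inf) x X) = mass * z**(-a) / Gamma(1 - a), finite for every z > 0.
    mean_count = mass * z ** (-a) / math.gamma(1.0 - a)
    count = poisson_draw(mean_count, rng)
    points = []
    for _ in range(count):
        # The normalised jump density on [z, inf) has CDF 1 - (z / t)**a; invert it.
        jump = z * (1.0 - rng.random()) ** (-1.0 / a)
        points.append((jump, rng.random()))
    return points
```

Calling `sample_points_above` repeatedly with decreasing bounds (and keeping only the newly uncovered band of jump sizes) would give the "rounds for decreasing bounds" scheme; that bookkeeping is omitted here.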

Without loss of generality, the Lévy measure of Equation (3) can be represented
as $\nu(dt, dx) = M \rho_\eta(dt|x) H(dx)$, where $\eta$ denotes the
hyper-parameters, if any, of a measure on $t$, $H(dx)$ is a probability measure
so $H(\mathbb{X}) = 1$, and $M$ is called the mass of the Lévy measure. Note
the total measure of $\rho_\eta(dt|x)$ is not standardized in any way so in
principle some mass could also appear in $\rho_\eta(dt|x)$. The mass is used as
a concentration parameter for the random measure.






Chen & Buntine & Ding 






A realization of $\tilde\mu$ on $\mathbb{X}$ can be constructed by sampling
from the underlying Poisson process in a number of ways, either in rounds for
decreasing bounds $z$ using the logic just given, or by explicitly sampling the
jumps in order. The latter goes as follows [FK72]:

Lemma 1 (Sampling a CRM) Sample a CRM $\tilde\mu$ with Lévy measure
$\nu(dt, dx) = M \rho_\eta(dt|x) H(dx)$ as follows.

• Draw i.i.d. samples $x_i$ from the base measure $H(dx)$.

• Draw the corresponding weights $J_i$ for these i.i.d. samples in decreasing order,
which goes as:

- Draw the largest jump $J_1$ from the cumulative distribution function
$P(J_1 \le j_1) = \exp\{-M \int_{j_1}^{\infty} \rho_\eta(dt|x_1)\}$.

- Draw the second largest jump $J_2$ from the cumulative distribution function
$P(J_2 \le j_2) = \exp\{-M \int_{j_2}^{j_1} \rho_\eta(dt|x_2)\}$.

• The random measure $\tilde\mu$ can now be realized as $\tilde\mu = \sum_i J_i \delta_{x_i}$.
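A minimal sketch of Lemma 1 in the homogeneous case follows. It assumes, purely for illustration, the stable-type density $\rho(t) = \frac{a}{\Gamma(1-a)} t^{-1-a}$, whose tail integral $\int_j^\infty \rho(dt) = j^{-a}/\Gamma(1-a)$ can be inverted in closed form. The cumulative distribution functions in the lemma say that $M$ times the tail integral at successive jumps increases by independent unit-rate exponential amounts, which is what the loop uses. The function names and the `base_sampler` argument are hypothetical.

```python
import math
import random


def sample_decreasing_jumps(mass, a, num_jumps, rng):
    """Return the num_jumps largest jumps, in decreasing order.

    Assumed density: rho(t) = a / Gamma(1 - a) * t**(-1 - a), so the k-th jump
    solves mass * J_k**(-a) / Gamma(1 - a) = G_k, where G_k is the k-th arrival
    time of a unit-rate Poisson process.
    """
    jumps = []
    arrival = 0.0
    for _ in range(num_jumps):
        arrival += rng.expovariate(1.0)
        jumps.append((arrival * math.gamma(1.0 - a) / mass) ** (-1.0 / a))
    return jumps


def sample_truncated_crm(mass, a, num_jumps, base_sampler, rng):
    """Pair each jump with an i.i.d. atom from the base measure H."""
    jumps = sample_decreasing_jumps(mass, a, num_jumps, rng)
    return [(jump, base_sampler(rng)) for jump in jumps]
```

For example, `sample_truncated_crm(2.0, 0.5, 100, lambda r: r.gauss(0.0, 1.0), random.Random(0))` gives the 100 largest weighted atoms of one realization with a standard normal base measure; the truncation level is a choice of the sketch, not of the lemma.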

As a random variable is uniquely determined by its Laplace transform, the
random measure $\tilde\mu$ is uniquely characterized by its Laplace functional
through the Lévy-Khintchine representation of a Lévy process [C10]. That is,
for any measurable function $f : \mathbb{X} \to \mathbb{R}^+$, we have

$$\mathbb{E}\left[\exp\left\{-\int_{\mathbb{X}} f(x)\,\tilde\mu(dx)\right\}\right] = \exp\left\{-\int_{\mathbb{R}^+ \times \mathbb{X}} \left[1 - \exp\{-t f(x)\}\right] \nu(dt, dx)\right\}. \tag{4}$$

Now instead of dealing with $\tilde\mu$ itself, we deal with $\nu(dt, dx)$,
which is called the Lévy measure of $\tilde\mu$, whose role in generating the
measure via a Poisson process was explained above.

In the case where the measure on the jumps is not dependent on the data $x$, so
$\rho_\eta(dt|x) = \rho_\eta(dt)$, then $\tilde\mu$ is called homogeneous,
which is the case considered in this paper. When $f$ does not depend on $x$,
(4) simplifies to

$$\mathbb{E}\left[\exp\{-f \tilde\mu(B)\}\right] = \exp\left\{-M H(B) \int_{\mathbb{R}^+} \left[1 - \exp\{-t f\}\right] \rho_\eta(dt)\right\}. \tag{5}$$



Note the term inside the exponential plays an important role in subsequent
theory, so it is given a name.

Laplace exponent: The Laplace exponent, denoted as $\psi_\eta(f)$ for a CRM
with parameters $\eta$, is given by

$$\psi_\eta(f) = \int_{\mathbb{R}^+ \times \mathbb{X}} \left[1 - \exp\{-t f\}\right] \nu(dt, dx) = M \int_{\mathbb{R}^+} \left[1 - \exp\{-t f\}\right] \rho_\eta(dt) \quad \text{(homogeneous case)}. \tag{6}$$

Note that to guarantee the positiveness of jumps in the random measure,
$\rho_\eta(dt)$ in the Lévy measure should satisfy
$\int_0^\infty \rho_\eta(dt) = +\infty$ [C10], which leads to the following
equations:

$$\psi_\eta(0) = 0, \qquad \psi_\eta(+\infty) = +\infty. \tag{7}$$

That $\psi_\eta(f)$ is finite for finite positive $f$ implies (or is a
consequence of) $\int_0^\infty t\, \rho_\eta(dt)$ being finite.

Remark There are thus four different ways to define or interpret a CRM:

1. via the linear functional of Equation (3),

2. through the Lévy-Khintchine representation of Equation (4) using the Laplace
exponent,

3. sampling in order of decreasing jumps using Lemma 1, and

4. sampling in blocks of decreasing jump values as discussed before Lemma 1.

2.2 Normalized random measures 

Normalized Random Measures (NRM) Based on (3), a normalized random
measure on $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ is defined as³

$$\mu = \frac{\tilde\mu}{\tilde\mu(\mathbb{X})}. \tag{8}$$

The original idea of constructing random probabilities by normalizing
completely random measures on $\mathbb{R}$, namely increasing additive
processes, can be found in [ELP03], where it is termed normalized random
measures with independent increments (NRMI) and the existence of such random
measures is proved. This idea can be easily generalized from $\mathbb{R}$ to
any parameter space $\mathbb{X}$, e.g., $\mathbb{X}$ being the Dirichlet
distribution space in topic modeling. Also note that the idea of normalized
random measures can be taken as doing a transformation $Tr(\cdot)$ on
completely random measures, that is $\mu = Tr(\tilde\mu)$. In the normalized
random measure case, $Tr(\cdot)$ is a transformation such that
$Tr(\tilde\mu(\mathbb{X})) = 1$. A concise survey of other kinds of
transformations can be found in [LP10].

Taking different Lévy measures $\nu(dt, dx)$ in (3), we can obtain different
NRMs. We use $\mathrm{NRM}(\eta, M, H(\cdot))$ to denote the normalized random
measure, where $M$ is the total mass, which usually needs to be sampled in the
model, $H(\cdot)$ is the base probability measure, and $\eta$ is the set of
other hyper-parameters to the measure on the jumps, depending on the specific
NRM. In this paper, we are interested in a class of NRMs called normalized
generalized Gamma processes:



3 In this paper, we use $\mu$ to denote a normalized random measure, while we use $\tilde\mu$ to denote its
unnormalized counterpart.
Normalized Generalized Gamma Processes Generalized Gamma processes
are random measures proposed by Brix [Bri99] for constructing shot noise Cox
processes. They have Lévy measures of the form

$$\nu(dt, dx) = \frac{e^{-bt}}{t^{1+a}} H(dx), \quad b > 0,\ 0 < a < 1. \tag{9}$$

By normalizing the generalized Gamma process as in (8), we obtain the normalized
generalized Gamma process (NGG).

For ease of representation and sampling, we convert the NGG into a different form
using the following lemma.

Lemma 2 Let a normalised random measure be defined using Lévy density
$\nu(dx, dt)$. Then scaling $t$ by $\lambda > 0$ yields an equivalent NRM up to
a factor. That is, the normalised measure obtained using
$\nu(dx, dt/\lambda)$ is equivalent to the normalised measure obtained using
$\lambda\, \nu(dx, dt)$.

By this lemma, without loss of generality, we can instead represent the NGG by
eliminating the parameter $b$ above.
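As a worked illustration of how $b$ is eliminated (a sketch using only the change of variables $s = bt$ on the jump part of (9); the symbol $s$ is introduced here for the rescaled jump size):

```latex
\begin{align*}
\frac{e^{-bt}}{t^{1+a}}\,dt
  &= \frac{e^{-s}}{(s/b)^{1+a}}\,\frac{ds}{b}
     && \text{substituting } t = s/b,\ dt = ds/b \\
  &= b^{a}\,\frac{e^{-s}}{s^{1+a}}\,ds .
\end{align*}
% Rescaling every jump by the same constant b does not change the normalised
% measure, so an NGG with parameter b is equivalent to one with b = 1 whose
% Levy measure is multiplied by the factor b^a, i.e. the factor can be
% absorbed into the total mass.
```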

Normalized Generalized Gamma: The NGG with shape parameter $a$, total mass
(or concentration) parameter $M$ and base distribution $H(\cdot)$, denoted
$\mathrm{NGG}(a, M, H(\cdot))$, has Lévy density $M \rho_a(dt) H(dx)$ where

$$\rho_a(t) = \frac{a}{\Gamma(1 - a)} \frac{e^{-t}}{t^{1+a}}.$$

Note that similar to the two parameter Poisson-Dirichlet process [PY97], the
normalized generalized Gamma process with $a \neq 0$ can also produce power-law
phenomena, making it different from the Dirichlet process and suitable for
modeling real data.



Proposition 1 ([LMP07]) Let $K_n$ be the number of components induced by the
NGG with parameter $a$ and mass $M$ or the Dirichlet process with total mass
$M$. Then for the NGG, $K_n / n^a \to S_{a,M}$ almost surely, where $S_{a,M}$
is a strictly positive random variable parameterized by $a$ and $M$. For the
DP, $K_n / \log(n) \to M$.

Figure 2 demonstrates the power-law phenomena in the NGG compared to the
Dirichlet process (DP). We sample it using the generalized Blackwell-MacQueen
sampling scheme [JLP09]. Each data point to be sampled can choose an existing
cluster or create a new cluster, resulting in $K$ clusters with $N$ data points
in total.

Many familiar stochastic processes are special or limiting cases of normalized
generalized Gamma processes, e.g., Dirichlet processes arise when $a \to 0$.
Normalized inverse-Gaussian processes (N-IG) arise when $a = \frac{1}{2}$ and
$b = \frac{1}{2}$. If $b \to 0$, we get the $a$-stable process, and if
$a \to 0$ and $b$ depends on $x$, we get the extended Gamma process.
[Figure 2 plots; axis label: #data in each cluster.]

Figure 2: Power-law phenomena in NGG. The first plot shows the number of data
versus the number of clusters compared with DP, the second plot shows the size
$s$ of each cluster versus the total number of clusters with size $s$.

Remark For the NGG, key formulae used subsequently are as follows:

$$\psi_a(v) = M\left((1 + v)^a - 1\right)$$

$$\int_L^\infty \rho_a(dt) = |Q(-a, L)|$$

$$\int_L^\infty e^{-vt} \rho_a(dt) = (1 + v)^a\, |Q(-a, L(1 + v))|$$

$$\int_0^L \left(1 - e^{-vt}\right) \rho_a(t)\,dt = \left((1 + v)^a - 1\right) + (1 + v)^a\, |Q(-a, L(1 + v))| - |Q(-a, L)|$$

where $Q(x, y) = \Gamma(x, y)/\Gamma(x)$ is the regularized upper incomplete
Gamma function. Some mathematical libraries provide it for a negative first
argument, or it can be evaluated using

$$Q(-a, z) = Q(1 - a, z) - \frac{z^{-a}}{\Gamma(1 - a)} e^{-z},$$

using an upper incomplete Gamma function defined only for positive arguments.
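The first formula of the remark can be checked numerically. The sketch below integrates $M \int_0^\infty (1 - e^{-vt}) \rho_a(t)\,dt$ with the NGG density $\rho_a(t) = \frac{a}{\Gamma(1-a)} e^{-t} t^{-1-a}$ and compares it with the closed form. The substitution $t = x^{1/(1-a)}$, the truncation point and the step count are choices of this sketch (they remove the integrable singularity at $0$ and keep the midpoint rule simple), and the function names are hypothetical.

```python
import math


def rho_ngg(t, a):
    """NGG jump density rho_a(t) = a / Gamma(1 - a) * exp(-t) / t**(1 + a)."""
    return a / math.gamma(1.0 - a) * math.exp(-t) / t ** (1.0 + a)


def laplace_exponent_numeric(v, mass, a, upper=60.0, steps=20000):
    """Midpoint-rule estimate of mass * int_0^upper (1 - exp(-v t)) rho_a(t) dt."""
    p = 1.0 / (1.0 - a)            # substitution t = x**p flattens the integrand at 0
    x_max = upper ** (1.0 / p)
    h = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        t = x ** p
        # -expm1(-v t) = 1 - exp(-v t), accurate for tiny t; p * x**(p-1) is dt/dx.
        total += -math.expm1(-v * t) * rho_ngg(t, a) * p * x ** (p - 1.0)
    return mass * total * h


def laplace_exponent_closed(v, mass, a):
    """Closed form from the remark: M ((1 + v)**a - 1)."""
    return mass * ((1.0 + v) ** a - 1.0)
```

With $a = 0.5$ the two functions should agree to several decimal places; other values of $a$ converge more slowly under this simple rule.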

Finally, because probabilities for an NRM necessarily have the divisor
$\tilde\mu(\mathbb{X}) = \sum_{k=1}^\infty J_k$, and thus likelihoods of the
NRM should involve powers of $\tilde\mu(\mathbb{X})$, a trick is widely used to
eliminate these terms.

Latent relative mass: Consider the case where $N$ data are observed. By
introducing the auxiliary variable, called latent relative mass,
$U_N = \Gamma_N / \tilde\mu(\mathbb{X})$ where $\Gamma_N \sim \Gamma(N, 1)$,
then it follows that

$$\frac{1}{\tilde\mu(\mathbb{X})^N}\, p(\Gamma_N)\, d\Gamma_N = \frac{U_N^{N-1}}{\Gamma(N)} \exp\left\{-U_N \sum_{k=1}^\infty J_k\right\} dU_N .$$

Thus the $N$-th power of the normaliser can be replaced by an exponential term
in the jumps which factorizes, at the expense of introducing the new latent
variable $U_N$. To the best of our knowledge, the idea of this latent variable
originates from [Jam05] and is further explicitly studied in [JLP06, JLP09,
GW11], etc.
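Integrating the right-hand side of the identity over $U_N$ recovers $1/\tilde\mu(\mathbb{X})^N$, which is what makes the trick work. A small numerical check of that statement is sketched below; the midpoint rule, the truncation point and the function name are assumptions of the sketch.

```python
import math


def inverse_power_via_latent_mass(total_mass, n, steps=20000):
    """Estimate int_0^inf U**(n-1) / Gamma(n) * exp(-U * total_mass) dU.

    The exact value is total_mass**(-n), the n-th power of the normaliser
    that the latent relative mass replaces.
    """
    # Generous truncation: the integrand is a Gamma(n, rate=total_mass) kernel.
    upper = (n + 40.0 * math.sqrt(n) + 40.0) / total_mass
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        # Work in log space to avoid overflow for larger n.
        total += math.exp((n - 1) * math.log(u) - u * total_mass - math.lgamma(n))
    return total * h
```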

2.3 Slice sampling normalized random measure mixtures 

Slice sampling an NRM has been discussed in several papers; here we follow the
method in [GW11] to briefly introduce the ideas behind it. It deals with the
normalized random measure mixture of the type

$$\mu = \sum_{k=1}^{\infty} r_k \delta_{\theta_k}, \qquad \theta_{s_i} \sim \mu, \qquad x_i \sim g_0(\cdot \mid \theta_{s_i}), \tag{10}$$

where $r_k = J_k / \sum_{l=1}^{\infty} J_l$, $J_1, J_2, \cdots$ are the jumps of
the corresponding CRM defined in (3), the $\theta_k$'s are the components of
the mixture model drawn i.i.d. from a parameter space $H(\cdot)$, $s_i$ denotes
the component that $x_i$ belongs to, and $g_0(\cdot \mid \theta_k)$ is the
density function to generate data on component $k$. Given the observations
$x$, we introduce a slice latent variable $u_i$ for each $x_i$ so that we only
consider those components whose jump sizes $J_k$ are larger than the
corresponding $u_i$. Furthermore, the auxiliary variable $U_N$ (latent relative
mass) is introduced to decouple each individual jump $J_k$ from the infinite
sum of the jumps $\sum_{k=1}^{\infty} J_k$ appearing in the denominators of the
$r_k$'s. For clarification, we list the notation and its description in
Table 1. Based on [GW11], we have the following posterior lemma.

Lemma 3 The posterior of the infinite mixture model (10) with the above
auxiliary variables is proportional to

$$p(\theta, J_1, \cdots, J_K, K, u, L, s, U_N \mid x, \mathrm{NRM}(\eta, M, H(\cdot))) \propto U_N^{N-1} \exp\left\{-U_N \sum_{k=1}^{K} J_k\right\} \exp\left\{-M \int_0^L \left(1 - e^{-U_N t}\right) \rho_\eta(t)\,dt\right\}\, p(J_1, \cdots, J_K) \prod_{k=1}^{K} h(\theta_k) \prod_{i=1}^{N} 1(J_{s_i} > u_i)\, g_0(x_i \mid \theta_{s_i}), \tag{11}$$

where $1(a)$ is an indicator function returning 1 if $a$ is true and 0
otherwise, $h(\cdot)$ is the density of $H(\cdot)$,
$J_* = \sum_{k=K+1}^{\infty} J_k$, $L = \min\{u\}$, and
$p(J_1, \cdots, J_K) = \prod_{k=1}^{K} \frac{\rho_\eta(J_k)}{\int_L^\infty \rho_\eta(t)\,dt}$
is the distribution of the jumps larger than $L$, derived from the underlying
Poisson process (actually, $J$ follows a compound Poisson process, meaning that
it has $K \sim \mathrm{Poisson}(M \int_L^\infty \rho_\eta(dt))$ jumps, while
each jump has density $\frac{\rho_\eta(dt)}{\int_L^\infty \rho_\eta(dt)}$; here
$\mathrm{Poisson}(x)$ means the Poisson distribution with mean $x$).

The expressions for the NGG needed to work with this lemma were given in the
remark at the end of Section 2.2. Thus the integral term in Equation (11) can be
turned into an expression involving incomplete Gamma functions.






Table 1: List of notation.

Notation                            Description
$K$                                 Number of components with jump sizes larger than a threshold $L$
$\theta_k$, $k = 1, \cdots, K$      Components in the mixture model
$M$                                 Total mass of the random measure
$J_k$, $k = 1, \cdots, K$           Jump sizes of the random measure with all $J_k > L$
$J_*$                               Sum of the remaining jump sizes, $J_* = \sum_{k=K+1}^{\infty} J_k$, $J_k < L$
$y_i$, $i = 1, \cdots, N$           Observed data
$n_k$, $k = 1, \cdots, K$           Number of data attached to each component
$N$                                 Total number of data points
$s_i$, $i = 1, \cdots, N$           Variables indicating which component $y_i$ belongs to
$u_i$, $i = 1, \cdots, N$           Slice variable uniformly distributed in $(0, J_{s_i}]$ for $y_i$
$L$                                 $L = \min\{u\}$
$U_N$                               An auxiliary variable introduced to make the sampling feasible
$g_0(\cdot \mid \theta_k)$          Density function to generate data on component $\theta_k$
$h(\theta_k)$                       Density of $H(\theta_k)$
$p_M(M)$                            Prior for $M$
$\nu(dt, dx)$                       Lévy measure of the random measure with decomposition
                                    $\nu(dt, dx) = \rho_\eta(dt) H(dx)$ considered in this paper



2.3.1 Sampling:

First, we denote the parameter set as
$C = \{\theta, J_1, \cdots, J_K, K, u, L, s, U_N, M\}$, then the sampling goes as

• Sampling $s$: From (11) we get

$$p(s_i = k \mid C \setminus \{s_i\}) \propto 1(J_k > u_i)\, g_0(y_i \mid \theta_k) \tag{12}$$

• Sampling $U_N$: Similarly

$$p(U_N \mid C \setminus \{U_N\}) \propto U_N^{N-1} \exp\left\{-U_N \sum_{k=1}^{K} J_k\right\} \exp\left\{-M \int_0^L \left[1 - \exp\{-U_N t\}\right] \rho_\eta(dt)\right\}, \tag{13}$$

which can be sampled using rejection sampling from a proposal distribution
$\mathrm{Ga}(N, \sum_{k=1}^{K} J_k)$; here $\mathrm{Ga}(a, b)$ means a Gamma
distribution with shape parameter $a$ and rate (inverse scale) parameter $b$,
matching the first two factors of (13).

• Sampling $\theta$: The posterior of $\theta_k$ with prior density $h(\theta_k)$ is

$$p(\theta_k \mid C \setminus \{\theta_k\}) \propto h(\theta_k) \prod_{i : s_i = k} g_0(y_i \mid \theta_k). \tag{14}$$

• Sampling $K, \{J_1, \cdots, J_K\}$: Sampling for $J_k$ can be done separately
for those associated with data points (fixed points) and for those that are
not. Based on [JLP09], when integrating out $u$ in (11), the posterior of the
jump $J_k$ with data attached ($n_k > 0$) is proportional to

$$J_k^{n_k} \exp\{-U_N J_k\}\, \rho_\eta(J_k), \tag{15}$$

while for those without data attached ($n_k = 0$), based on [GW11], conditional
on $U_N$, the number of these jumps follows a Poisson distribution with mean

$$M \int_L^\infty \exp\{-U_N t\}\, \rho_\eta(dt),$$

while their lengths $t$ have densities proportional to

$$\exp\{-U_N t\}\, \rho_\eta(dt)\, 1(t > L).$$

• Sampling $u$: the $u_i$ are uniformly distributed in the interval
$(0, J_{s_i}]$ for each $i$. After sampling $u$, $L$ is set to $L = \min\{u\}$.

• Sampling $M$: The posterior of $M$ with prior $p_M(M)$ is

$$p(M \mid C \setminus \{M\}) \propto p_M(M)\, M^K \exp\left\{-M \left[\int_L^\infty \rho_\eta(dt) + \int_0^L \left[1 - \exp\{-U_N t\}\right] \rho_\eta(dt)\right]\right\}. \tag{16}$$

$p_M(M)$ is usually taken to be Gamma distributed, so the posterior of $M$ can
be sampled conveniently.
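Two of the steps above, drawing the slice variables and reallocating data via Equation (12), can be sketched as follows. This is an illustrative fragment, not the full sampler of [GW11]: the jumps and component parameters are held in plain lists, the likelihood is passed in as a function, and all names are hypothetical.

```python
import random


def sample_slices(jumps, assignments, rng):
    """Draw u_i uniformly on (0, J_{s_i}] for every observation; return (u, L = min u)."""
    slices = [jumps[s] * (1.0 - rng.random()) for s in assignments]
    return slices, min(slices)


def sample_assignments(jumps, slices, thetas, data, likelihood, rng):
    """Equation (12): p(s_i = k) is proportional to 1(J_k > u_i) g0(y_i | theta_k).

    The comparison uses >= so that the component the slice was drawn from
    always stays eligible, even in the boundary case u_i == J_{s_i}.
    """
    indices = range(len(jumps))
    new_assignments = []
    for y, u in zip(data, slices):
        weights = [likelihood(y, th) if j >= u else 0.0
                   for j, th in zip(jumps, thetas)]
        new_assignments.append(rng.choices(indices, weights=weights)[0])
    return new_assignments
```

Only components whose jump exceeds the slice level receive positive weight, so each allocation step touches a finite number of components even though the mixture is infinite.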



3 Operations 

This section introduces the dependency operations used. These are developed for 
Poisson processes, CRMs and NRMs. 
3.1 Operations on Poisson processes

We review three operations that transform Poisson processes in order to construct
dependent completely random measures. For details, refer to [Kin93, LGF10].

Superposition of Poisson processes Given a set of Poisson processes
$\Pi_1, \Pi_2, \cdots, \Pi_n$, the superposition of these Poisson processes is
defined as the union of the points in these Poisson processes:

$$\Pi := \bigcup_{i=1}^{n} \Pi_i \tag{17}$$

Lemma 4 (Superposition Theorem) Let $\Pi_1, \cdots, \Pi_n$ be $n$ independent
Poisson processes on $\mathbb{S}$ with $\Pi_k \sim \mathrm{PoissonP}(\nu_k)$,
then the superposition of these $n$ Poisson processes is still a Poisson
process with $\Pi \sim \mathrm{PoissonP}(\sum_i \nu_i)$.

Subsampling of Poisson processes Subsampling of a Poisson process with
sampling rate $q(\theta)$ is defined to be selecting the points of the Poisson
process via independent Bernoulli trials with acceptance rate $q(\theta)$.

Lemma 5 (Subsampling Theorem) Let $\Pi \sim \mathrm{PoissonP}(\nu)$ be a
Poisson process on the space $\mathbb{S}$ and $q : \mathbb{S} \to [0, 1]$ be a
measurable function. If we independently draw $z_\theta \in \{0, 1\}$ for each
$\theta \in \Pi$ with $P(z_\theta = 1) = q(\theta)$, and let
$\Pi_k = \{\theta \in \Pi : z_\theta = k\}$ for $k = 0, 1$, then $\Pi_0, \Pi_1$
are independent Poisson processes on $\mathbb{S}$ with
$S^{1-q}(\Pi) := \Pi_0 \sim \mathrm{PoissonP}((1 - q)\nu)$ and
$S^{q}(\Pi) := \Pi_1 \sim \mathrm{PoissonP}(q\nu)$.

Point transition of Poisson processes Point transition of a Poisson process
$\Pi$ on space $(\mathbb{S}, \mathcal{S})$, denoted as $T(\Pi)$,⁴ is defined as
moving each point of the Poisson process independently to other locations
following a probabilistic transition kernel $T$, which is defined to be a
function $T : \mathbb{S} \times \mathcal{S} \to [0, 1]$ such that for each
$\theta \in \mathbb{S}$, $T(\theta, \cdot)$ is a probability measure on
$\mathbb{S}$ that describes the distribution of where the point $\theta$ moves,
and for each $A \in \mathcal{S}$, $T(\cdot, A)$ is integrable. Thus,
$T(\Pi) := \{\theta' : \theta' \sim T(\theta, \cdot) \mid \theta \in \Pi\}$.
With a little abuse of notation, we use $T(\theta)$ to denote a sample from
$T(\theta, \cdot)$ in this paper. Thus $T(\theta)$ is a stochastic function.

Lemma 6 (Transition Theorem) Let $\Pi \sim \mathrm{PoissonP}(\nu)$ be a Poisson
process on space $(\mathbb{S}, \mathcal{S})$ and $T$ a probability transition
kernel, then

$$T(\Pi) \sim \mathrm{PoissonP}(T\nu), \tag{18}$$

where $T\nu$ can be considered as a transformation of measures over
$\mathbb{S}$ defined as
$(T\nu)(A) := \int_{\mathbb{S}} T(\theta, A)\, \nu(d\theta)$ for $A \in \mathcal{S}$.



4 In the following we will use $T(\cdot)$ to denote the point transition operation, while we use $T(\cdot, \cdot)$ to
denote the corresponding transition kernel.
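On a finite realization (a list of points), the three operations of this section reduce to a few lines each. The sketch below is an illustration with hypothetical function names; `q` and `kernel` are user-supplied callables standing in for the sampling rate $q(\theta)$ and for a draw from $T(\theta, \cdot)$.

```python
import random


def superpose(*processes):
    """Union of the points of several independent Poisson processes (Lemma 4)."""
    return [point for process in processes for point in process]


def subsample(process, q, rng):
    """Independent Bernoulli selection with acceptance rate q(point) (Lemma 5).

    Returns (kept, dropped): the realizations of S^q(Pi) and S^{1-q}(Pi).
    """
    kept, dropped = [], []
    for point in process:
        (kept if rng.random() < q(point) else dropped).append(point)
    return kept, dropped


def transition(process, kernel, rng):
    """Move each point independently with a draw from T(point, .) (Lemma 6)."""
    return [kernel(point, rng) for point in process]
```

For instance, `transition(points, lambda th, r: th + r.gauss(0.0, 0.1), rng)` applies a Gaussian random-walk kernel to every point.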
3.2 Operations on random measures
3.2.1 Operations on CRMs

The dependency operations defined on Poisson processes in Section 3.1 can be
naturally generalized to completely random measures given the construction in
(3). Formally, we have

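Superposition of CRMs Given $n$ independent CRMs
$\tilde\mu_1, \cdots, \tilde\mu_n$ on $\mathbb{X}$, the superposition
($\oplus$) is defined as:

$$\tilde\mu_1 \oplus \tilde\mu_2 \oplus \cdots \oplus \tilde\mu_n := \tilde\mu_1 + \tilde\mu_2 + \cdots + \tilde\mu_n .$$

Subsampling of CRMs Given a CRM
$\tilde\mu = \sum_{k=1}^{\infty} J_k \delta_{\theta_k}$ on $\mathbb{X}$, and a
measurable function $q : \mathbb{X} \to [0, 1]$. If we independently draw
$z(\theta) \in \{0, 1\}$ for each $\theta \in \mathbb{X}$ with
$p(z(\theta) = 1) = q(\theta)$, the subsampling of $\tilde\mu$ is defined as

$$S^q(\tilde\mu) := \sum_k z(\theta_k) J_k \delta_{\theta_k}. \tag{19}$$

Point transition of CRMs Given a CRM
$\tilde\mu = \sum_{k=1}^{\infty} J_k \delta_{\theta_k}$ on $\mathbb{X}$, the
point transition of $\tilde\mu$ is to draw atoms $\theta'_k$ from a transformed
base measure to yield a new random measure as

$$T(\tilde\mu) := \sum_{k=1}^{\infty} J_k \delta_{\theta'_k}.$$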

3.2.2 Operations on NRMs 

The operations on NRMs can be naturally generalized from those on CRMs: 

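Superposition of NRMs Given $n$ independent NRMs $\mu_1, \cdots, \mu_n$ on
$\mathbb{X}$, the superposition ($\oplus$) is:

$$\mu_1 \oplus \mu_2 \oplus \cdots \oplus \mu_n := c_1 \mu_1 + c_2 \mu_2 + \cdots + c_n \mu_n ,$$

where the weights $c_m = \frac{\tilde\mu_m(\mathbb{X})}{\sum_j \tilde\mu_j(\mathbb{X})}$
and $\tilde\mu_m$ is the unnormalized random measure corresponding to $\mu_m$.

Subsampling of NRMs Given an NRM
$\mu = \sum_{k=1}^{\infty} r_k \delta_{\theta_k}$ on $\mathbb{X}$, and a
measurable function $q : \mathbb{X} \to [0, 1]$. If we independently draw
$z(\theta) \in \{0, 1\}$ for each $\theta \in \mathbb{X}$ with
$p(z(\theta) = 1) = q(\theta)$, the subsampling of $\mu$ is defined as

$$S^q(\mu) := \sum_{k : z(\theta_k) = 1} \frac{r_k}{\sum_j z(\theta_j) r_j}\, \delta_{\theta_k}.$$

Point transition of NRMs Given an NRM
$\mu = \sum_{k=1}^{\infty} r_k \delta_{\theta_k}$ on $\mathbb{X}$, the point
transition of $\mu$ is to draw atoms $\theta'_k$ from a transformed base
measure to yield a new NRM as

$$T(\mu) := \sum_{k=1}^{\infty} r_k \delta_{\theta'_k}.$$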
The definitions are constructed so the following simple lemma holds. 

Lemma 7 Superposition, subsampling or point transition of NRMs is equivalent to 
superposition, subsampling or point transition of their underlying CRMs. 

Thus one does not need to distinguish between whether these operations are on 
CRMs or NRMs. 
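The superposition case of Lemma 7 is easy to check on finite measures. In the sketch below a measure is represented, purely for illustration, as a dictionary from atom to weight; the function names are hypothetical. Normalising the summed CRMs gives the same result as mixing the individual NRMs with weights $c_m = \tilde\mu_m(\mathbb{X}) / \sum_j \tilde\mu_j(\mathbb{X})$.

```python
def normalise(crm):
    """Turn a finite CRM {atom: jump} into an NRM {atom: probability}."""
    total = sum(crm.values())
    return {atom: jump / total for atom, jump in crm.items()}


def superpose_crms(crms):
    """Superposition of CRMs: add the measures atom by atom."""
    out = {}
    for crm in crms:
        for atom, jump in crm.items():
            out[atom] = out.get(atom, 0.0) + jump
    return out


def superpose_nrms(crms):
    """Superposition of the corresponding NRMs with weights c_m = total_m / sum_j total_j."""
    totals = [sum(crm.values()) for crm in crms]
    grand_total = sum(totals)
    out = {}
    for crm, total in zip(crms, totals):
        weight = total / grand_total
        for atom, prob in normalise(crm).items():
            out[atom] = out.get(atom, 0.0) + weight * prob
    return out
```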

4 Posteriors for the NGG 

This section develops posteriors for the single NGG, for a standard version
$p(X \mid \mathrm{NGG}(a, M, H(\cdot)))$ and a version conditioned on the latent
relative mass $U_N$, $p(X \mid U_N, \mathrm{NGG}(a, M, H(\cdot)))$. The second
version is done because, as shown, the first version requires computing a
complex recursive function.

4.1 Simple Posterior 

James et al. [JLP09] develop posterior analysis as follows. This theorem simplifies
their results and specialises them to the NGG.

Theorem 2 (Posterior Analysis for the NGG) Consider the
$\mathrm{NGG}(a, M, H(\cdot))$. For a data vector $X$ of length $N$ there are
$K$ distinct values $X_1^*, \ldots, X_K^*$ with counts $n_1, \ldots, n_K$
respectively. The posterior marginal is given by

$$p(X \mid \mathrm{NGG}(a, M, H(\cdot))) = \frac{e^M a^{K-1}\, T^{N,K}_{a,M}}{\Gamma(N)} \prod_{k=1}^{K} (1 - a)_{n_k - 1}\, h(X_k^*). \tag{21}$$

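where $T^{N,K}_{a,M}$ is defined as

$$T^{N,K}_{a,M} = \int_M^\infty \left(1 - \left(\frac{M}{t}\right)^{1/a}\right)^{N-1} t^{K-1} e^{-t}\, dt. \tag{22}$$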

Moreover, the predictive posterior is given by:

$$p(X_{N+1} \in dx \mid X, \mathrm{NGG}(a, M, H(\cdot))) = \omega_0 H(dx) + \sum_{k=1}^{K} \omega_k \delta_{X_k^*}(dx)$$

where the weights, which sum to 1 ($\sum_{k=0}^{K} \omega_k = 1$), are derived as

$$\omega_0 \propto a\, \frac{T^{N+1,K+1}_{a,M}}{T^{N+1,K}_{a,M}}, \qquad \omega_k \propto (n_k - a). \tag{23}$$



Note that an alternative definition of $T^{N,K}_{a,M}$ is

$$T^{N,K}_{a,M} = \frac{a M^K}{e^M} \int_{\mathbb{R}^+} \frac{u^{N-1}}{(1 + u)^{N - Ka}}\, e^{M - M(1+u)^a}\, du$$

and various scaled versions of this integral are presented in the literature.
Introducing a $\Gamma(b/a, 1)$ prior on $M$ and then marginalising out $M$
makes the term in $e^{M - M(1+u)^a}$ disappear since the integral over $M$ can
be carried inside the integral over $u$.

Corollary 1 Let $\mu \sim \mathrm{NGG}(a, M, H(\cdot))$ and suppose
$M \sim \Gamma(b/a, 1)$; then it follows that $\mu \sim \mathrm{PDP}(a, b, H(\cdot))$.

For computation, the issue here will be computing the terms $T^{N,K}_{a,M}$.
Therefore we present some results for this.

Lemma 8 (Evaluating $T^{N,K}_{a,M}$) Have $T^{N,K}_{a,M}$ defined as in Theorem 2.
Then the following formulae hold:

$$T^{N,K}_{a,M} \le \Gamma(K, M), \tag{24}$$

n=0 ^ ' V a) 

T a N £ +2 = K T^ K + ^1 fef - T^) , ViV > 2,K G N+ (26) 

where $\Gamma(x, y)$ is the upper incomplete gamma function, defined for
$y > 0$ and $x \neq 0, -1, -2, \ldots$. Moreover, for Equation (25), $ka$ cannot
be integral for $k = 1, \cdots, K - 1$. Another recursion is needed when
$a = 1/R$ for some $R \in \mathbb{N}^+$, $R > 1$. Then

$$T^{N,K}_{a,M} = T^{N-1,K}_{a,M} - M^{1/a}\, T^{N-1,K-1/a}_{a,M}, \quad \forall K > R,\ N, K \in \mathbb{N}^+. \tag{27}$$

It can be seen there are two different situations. When $a = 1/R$ for some
$R \in \mathbb{N}^+$, $R > 1$, then one can recurse down on $N$. But otherwise,
one recurses down on $K$. Moreover, $T^{N,K}_{a,M}$ is a strictly decreasing
function of $N$ and $M$, but an increasing function of $K$ and $a$. For
computation, Equation (25) can be used to compute $T^{N,1}_{a,M}$ and
$T^{N,2}_{a,M}$ in terms of the upper incomplete Gamma function. This equation
may not be usable for $K > 2$ and may be unstable. Thereafter, for $K > 2$ in
$T^{N,K}_{a,M}$ the recursion of Equation (26) can be applied.
Remark The Poisson-Dirichlet Process and Dirichlet Process are well known for
their ease of use in a hierarchical context [TJBB06, UDB11, BH12]. The NGG has
the same general form.

The major issue with this posterior theory is that one needs to precompute the
terms $T^{N,K}_{a,M}$. While the Poisson-Dirichlet Process has a similar style,
it has a generalised Stirling number dependent only on the discount $a$ [BH12].
The difference is that for the PDP we can tabulate these terms for a given
discount parameter $a$ and still vary the concentration parameter ($b$ above,
but corresponding to $M$) easily. For the NGG, any tables of $T^{N,K}_{a,M}$
would need to be recomputed with every change in mass parameter $M$. This might
represent a significant computational burden.



4.2 Conditional Posterior 

James et al. [JLP09] also develop conditional posterior analysis as follows. This
theorem simplifies their results and specialises them to the NGG.

Theorem 3 (Conditional Posterior Analysis for the NGG) Consider the
$\mathrm{NGG}(a, M, H(\cdot))$ and the situation of Theorem 2. The conditional
posterior marginal, conditioned on the auxiliary variable $U_N$, is given by

$$p(X \mid U_N = u, \mathrm{NGG}(a, M, H(\cdot)), N) = \frac{\left(M a (1 + u)^a\right)^K}{\sum_{k=1}^{N} S^N_{k,a} \left(M a (1 + u)^a\right)^k} \prod_{k=1}^{K} (1 - a)_{n_k - 1}\, h(X_k^*). \tag{28}$$

Moreover, the predictive posterior is given by:

$$p(X_{N+1} \in dx \mid X, U_N = u, \mathrm{NGG}(a, M, H(\cdot)), N) = \omega_0 H(dx) + \sum_{k=1}^{K} \omega_k \delta_{X_k^*}(dx)$$

where the weights, which sum to 1 ($\sum_{k=0}^{K} \omega_k = 1$), are derived as

$$\omega_0 \propto M a (1 + u)^a, \qquad \omega_k \propto n_k - a. \tag{29}$$

The posterior for $U_N$ is given by:

$$p(U_N = u \mid X, \mathrm{NGG}(a, M, H(\cdot)), N) = \frac{a M^K}{T^{N,K}_{a,M}} \frac{u^{N-1}}{(1 + u)^{N - Ka}}\, e^{-M(1+u)^a}. \tag{30}$$
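The predictive weights of Equation (29) need no recursive terms once $u$ is given, which is what makes the conditional form convenient. A minimal sketch follows; the function name and argument layout are hypothetical.

```python
def conditional_predictive_weights(counts, a, mass, u):
    """Normalised weights (omega_0, omega_1, ..., omega_K) from Equation (29).

    omega_0 (a new value drawn from H) is proportional to M a (1 + u)**a,
    omega_k (an existing value with count n_k) is proportional to n_k - a.
    """
    weights = [mass * a * (1.0 + u) ** a] + [n - a for n in counts]
    total = sum(weights)
    return [w / total for w in weights]
```

The normalising constant is $M a (1+u)^a + N - Ka$, since the counts sum to $N$.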

A posterior distribution is also presented by James et al. as their major result of
Theorem 1 [JLP09]. We adapt it here to the NGG.

Theorem 4 In the context of Theorem 3, the conditional posterior of the
normalised random measure $\mu$ given data $X$ of length $N$ and latent
relative mass $U_N = u$ is given by

$$\mu = \frac{T}{T + J^+}\, \mu' + \frac{J^+}{T + J^+} \sum_{k=1}^{K} p_k \delta_{X_k^*}$$
where



ft ~ JVGG^ a ,_^_ jjH -(.)^ 



$T \sim f_T(t)$ where the Lévy measure of $f_T(t)$ is $\frac{M a}{\Gamma(1 - a)} s^{-a-1} e^{-(1+u)s}$,

$J^+ \sim \Gamma(N - Ka, 1 + u)$,

$p \sim \mathrm{Dirichlet}_K(n - a)$.

Here, $\mu'$, $J^+$ and $p$ are jointly independent and $T$, $J^+$ and $p$ are
jointly independent.

Note in particular the densities given for $\mu'$ and $T$ are not independent
from each other. While an explicit density is not given for $T$, its expected
value is easily computed via the Laplace transform as $M a (1 + u)^{a-1}$.

Griffin et al. [GW11] present an alternative technique for obtaining the
conditional posterior. The following is adapted from their main sampler after
integrating out the slice variables.

Theorem 5 (Sampling Posterior) Consider a bound $0 < L < \infty$ which is
sufficiently small so that it is less than the jumps $J_k$ associated with all
the observed data. For an NRM given by $\mathrm{NRM}(\eta, M, H(\cdot))$, the
number of jumps $K_L$ with value $J_k > L$ is a random variable, as are their
values $J_1, \cdots, J_{K_L}$. The resultant posterior is as follows:

p(x,U N = u\ K L , Ji, J Kl ,N, NRM(rj, M, #(•)) 



N-l _M/ ( f(l-e-«« ) Pv (s)ds 



K 



L 



= u 

k=l 

where $X_k^*$ are the unique data values (from $X$) and $n_k$ are the counts of data from $X$
having the value $X_k^*$.

The expressions for the NGG needed to work with this theorem were given in the
remark at the end of Section 2.2. We further simplify this by marginalising out the
jumps $J_k$ and then taking the limit as $L \to 0$. Note we have renumbered indexes so
that $n_k > 0$ for all $k = 1, \dots, K$ where $K < K_L$. This matches the conditionals of
Theorem E] so is seen to be correct. 

Corollary 2 (Reduced Sampling Posterior) In the context of the preceding theorem, assume there are $K$ jumps with attached data such that $n_k > 0$. The resultant posterior
is as follows:

p (x,U N =u,K\N, NGG (a, M, H(-)) 
Moreover,



P 




-a-i e -(i+«)^( X *) . 



(32) 



Remark With the use of the latent relative mass $U_N$, the NGG lends itself to
hierarchical reasoning without a need to compute the recursive series $T^{N,K}_{a,M}$. This
can be done with either the jumps integrated out, or the jumps retained.

5 Dependencies and Properties of Operations 

This section presents a number of results to do with the operations applied to the 
NRMs. First dependencies such as covariances are presented. Then some further 
properties are developed for when the operations are used in a network. 

5.1 Dependencies between NRMs via Operations 

Properties of the NRMs here are given in terms of the Laplace exponent and its
derivatives. In the Dirichlet process case, we have $\psi(v) = M \log(1+v)$, while in
the normalized generalized Gamma process case, we have $\psi_a(v) = M\left((1+v)^a - 1\right)$.
Because the dependencies involve the total masses significantly, we use a modified
version of the Laplace exponent in all these results. Define $\tilde\psi_\eta(v) = \frac{1}{M}\psi_\eta(v)$, which
has the mass removed.

Unlike in the Dirichlet process, the total masses $M$ are in general no longer independent
of the normalized jumps of a normalized random measure. However,
we can still derive the correlations between different NRMs. The following theorems
summarize these results.

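Lemma 9 (Mean and Variance of an NRM) Given a normalized random
measure $\mu$ on $\mathbb{X}$ with the underlying Lévy measure $\nu(dt, dx) = M\rho_\eta(dt)P(dx)$, for
all $B \in \mathcal{B}(\mathbb{X})$ the mean of this NRM is given by

$$E[\mu(B)] = P(B). \tag{33}$$

The variance of this NRM is given by

$$\mathrm{Var}(\mu(B)) = P(B)\,(P(B) - 1)\, M \int_0^\infty v\, \tilde\psi_\eta''(v)\, \exp\{-M \tilde\psi_\eta(v)\}\, dv. \tag{34}$$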
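Remark For the DP, the corresponding variance is:

$$\mathrm{Var}_{DP}(\mu(B)) = \frac{P(B)(1 - P(B))}{M + 1}.$$

For the NGG, it is

$$\mathrm{Var}_{NGG}(\mu(B)) = P(B)(1 - P(B))\, \frac{1-a}{a}\, e^{M} M^{\frac{1}{a}} \left|\Gamma\!\left(-\tfrac{1}{a}, M\right)\right|.$$

For large $M$ the upper incomplete gamma function used here has the property that
$e^{M} M^{1+\frac{1}{a}} \left|\Gamma\!\left(-\frac{1}{a}, M\right)\right| \to 1$, and so we get for large $M$

$$\mathrm{Var}_{NGG}(\mu(B)) \approx P(B)(1 - P(B))\, \frac{1-a}{aM}.$$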
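The variance factors in this remark can be checked by direct quadrature of $-M\int_0^\infty v\,\tilde\psi''(v)\exp\{-M\tilde\psi(v)\}\,dv$, using $\tilde\psi(v) = \log(1+v)$ for the DP and $\tilde\psi(v) = (1+v)^a - 1$ for the NGG as stated at the start of this section. The sketch below is an illustration only; the function names and the midpoint rule with the substitution $v = x/(1-x)$ are choices made here.

```python
import math


def nrm_variance_factor(psi, psi2, M, n=100_000):
    """Return -M * int_0^inf v psi''(v) exp(-M psi(v)) dv by the midpoint rule.

    psi is the mass-free Laplace exponent and psi2 its second derivative.
    Var(mu(B)) equals P(B) (1 - P(B)) times the returned factor.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h              # midpoint in (0, 1)
        v = x / (1.0 - x)              # maps (0, 1) onto (0, inf)
        jac = 1.0 / (1.0 - x) ** 2     # dv/dx
        total += -v * psi2(v) * math.exp(-M * psi(v)) * jac
    return M * h * total


def dp_factor(M):
    return nrm_variance_factor(lambda v: math.log1p(v),
                               lambda v: -1.0 / (1.0 + v) ** 2, M)


def ngg_factor(a, M):
    return nrm_variance_factor(lambda v: (1.0 + v) ** a - 1.0,
                               lambda v: a * (a - 1.0) * (1.0 + v) ** (a - 2.0), M)


print(dp_factor(2.0), 1.0 / (2.0 + 1.0))              # DP: factor is 1 / (M + 1)
print(ngg_factor(0.5, 50.0), (1 - 0.5) / (0.5 * 50))  # NGG: large-M approximation
```

For moderate $M$ the NGG factor differs from $(1-a)/(aM)$ by a few percent, which is consistent with the limit being stated only for large $M$.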

Theorem 6 (Dependency via superposition) Suppose $\mu_i$, $i = 1, \dots, n$, are $n$
independent normalized random measures on $\mathbb{X}$ with the underlying Lévy measures
$\nu_i(dt, dx) = M_i \rho_\eta(dt) P(dx)$. Let $\mu = \mu_1 \oplus \cdots \oplus \mu_n$ and $B \in \mathcal{B}(\mathbb{X})$. Then the covariance
between $\mu_k$ ($k \le n$) and $\mu$ is

Cov{n k {B),n{B)) = 

P{B)M k j 7 (M fc , P(P), u) exp i -QT Mj)ijj v (v) I dt; 



2 £* fc 



where 



7 (M k ,P(B),v)= (36) 
{p{B)M k ^{ Vl ) 2 - j%(vS) exp {-MA( Vl )} d Vl 

Theorem 7 (Dependency via subsampling) Let $\tilde\mu$ be a completely random
measure on $\mathbb{X}$ with Lévy measure $\nu(dt, dx) = M\rho_\eta(dt)P(dx)$, and let $\mu = \tilde\mu / \tilde\mu(\mathbb{X})$. The covariance
between $\mu$ and its subsampling version $S^q(\mu)$, denoted as $\mu^q$, with sampling
rate $q(\cdot)$, on $B \in \mathcal{B}(\mathbb{X})$ is

Cov(p q {B),p(B)) = 
P(B)M q J 7 (M„ P(P),^) exp | — (M - M,ty» } dt; 

+ m? (^) , (37) 

where $M_q := (q\tilde\mu)(\mathbb{X}) = \int_{\mathbb{X}} q(x)\, \tilde\mu(x)\, dx$.
Theorem 8 (Dependency via point transition) Let $\tilde\mu$ be a random measure on
$\mathbb{X}$ with Lévy measure $\nu(dt, dx) = M\rho_\eta(dt)P(dx)$, and let $\mu = \tilde\mu / \tilde\mu(\mathbb{X})$. Let $B \in \mathcal{B}(\mathbb{X})$, and let
$A = T(B) := \{x : x \sim T(y, \cdot),\ y \in B\}$ be the set of points obtained after the point
transition on $B$, thus $P(A) = \int_B P(T(x))\, dx$. Suppose $A$ and $B$ are disjoint (which
is usually the case when the transition operator $T$ is appropriately defined). Then the
covariance between $\mu$ and its point transition version $T(\mu)$ on $B \in \mathcal{B}(\mathbb{X})$ is

Cov(fi(B), (Tfj)(B)) = P(A)P(B) (38) 

poo pv\ 
r2 / / „7> („. \2 



Jo Jo ^ v ^ e ^{- M ^ v 2))dv 2 dvi-l 

5.2 Properties of the three dependency operations 

We first prove the following two lemmas about superposition and subsampling of
CRMs.

A straightforward extension of Theorem 1 of [JLP09] leads to the following lemma
about the posterior of CRMs under superposition.

Lemma 10 (Posterior of CRMs under superposition) Let $\tilde\mu_1, \tilde\mu_2, \dots, \tilde\mu_n$ be
$n$ independent CRMs defined on space $\mathbb{X}$, with Lévy measures $\nu_i(dt, dx)$ for $i =
1, \dots, n$. Let

$$\tilde\mu = \oplus_{i=1}^{n} \tilde\mu_i. \tag{39}$$

Then given observed data $X = \{X_i\}$ (we use $X_k^*$ to denote the distinct values among
$X$) and a latent relative mass $U_n$, the posterior of $\tilde\mu$ is given by (we use $x|(y)$ to
denote the variable $x$ conditioned on $y$)

$$\tilde\mu|(U_n, X) = \tilde\mu|(U_n) + \sum_{k=1}^{K} J_k\, \delta_{X_k^*}, \tag{40}$$

where

1. $\tilde\mu|(U_n)$ is a CRM with Lévy measure

$$\nu(dt, dx) = e^{-ut} \sum_{i=1}^{n} \nu_i(dt, dx).$$

2. $X_k^*$ ($k = 1, \dots, K$) are the fixed points of discontinuity and the $J_k$'s are the corresponding jumps with densities proportional to

$$t^{n_k} e^{-ut} \sum_{i=1}^{n} \nu_i(dt, dx),$$

where $n_k$ is the number of data attached at jump $J_k$.

3. $\tilde\mu|(U_n)$ and the $J_k$'s are independent.

For subsampling, we can prove the following formula for the Lévy measure of the
subsampled measure.

Lemma 11 (Lévy measure under subsampling) Let $\tilde\mu = \sum_{k=1}^{\infty} J_k \delta_{x_k}$ be a
CRM with Lévy measure $\nu(dt, dx)$. Let $S^q(\tilde\mu)$ be its subsampling version with acceptance
rate $q(\cdot)$, then $S^q(\tilde\mu)$ has the Lévy measure $q(dx)\nu(dt, dx)$.
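Lemma 11 can be illustrated by simulation. The sketch below is a toy example under a strong simplifying assumption: it uses a finite-activity CRM on $[0,1]$ with Lévy measure $\lambda\,\mathrm{Exp}(1)(dt)\,\mathrm{Uniform}(dx)$ so that it can be sampled exactly, whereas the NGG has infinitely many jumps. All function names are hypothetical. If the subsampled measure has Lévy measure $q(x)\nu(dt, dx)$, its expected total mass is $\lambda \int_0^1 q(x)\,dx$.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_crm(rng, lam):
    """Atoms (x_k, J_k) of the toy CRM: Poisson(lam) atoms, Exp(1) jumps."""
    return [(rng.random(), rng.expovariate(1.0))
            for _ in range(sample_poisson(rng, lam))]


def subsample(rng, atoms, q):
    """Keep each atom independently with probability q(x)."""
    return [(x, j) for (x, j) in atoms if rng.random() < q(x)]


def mean_subsampled_mass(lam, q, reps=20_000, seed=0):
    """Monte Carlo estimate of the expected total mass of the subsampled CRM."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += sum(j for _, j in subsample(rng, sample_crm(rng, lam), q))
    return total / reps


# With q(x) = x the thinned Levy measure predicts lam * 1 * 1/2 = 2.5.
print(mean_subsampled_mass(5.0, lambda x: x))
```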

Now we give some properties of compositions of the dependency operations,
which follow simply.

Lemma 12 (Composition of dependency operators) Given CRMs $\tilde\mu$, $\tilde\mu'$ and
$\tilde\mu''$, the following hold:

• Two subsampling operations are commutative. So with acceptance rates $q(\cdot)$
and $q'(\cdot)$, $S^{q'}(S^q(\tilde\mu)) = S^q(S^{q'}(\tilde\mu))$. Both are equal to $S^{q'q}(\tilde\mu)$.

• A constant subsampling operation commutes with a point transition operation.
Thus $S^q(T(\tilde\mu)) = T(S^q(\tilde\mu))$ where the acceptance rate $q$ is independent of the
data space.

• Subsampling and point transition operations distribute over superposition.
Thus for acceptance rate $q(\cdot)$ and point transition $T(\cdot)$,

$$S^q(\tilde\mu \oplus \tilde\mu') = S^q(\tilde\mu) \oplus S^q(\tilde\mu'), \qquad T(\tilde\mu \oplus \tilde\mu') = T(\tilde\mu) \oplus T(\tilde\mu').$$

• Superposition is commutative and associative. Thus $\tilde\mu \oplus \tilde\mu' = \tilde\mu' \oplus \tilde\mu$ and
$(\tilde\mu \oplus \tilde\mu') \oplus \tilde\mu'' = \tilde\mu \oplus (\tilde\mu' \oplus \tilde\mu'')$.

Thus when subsampling operations are all constant, a composition of subsampling,
point transition and superposition operations admits a normal form where all the
subsampling operations are applied first, then the transition operations and lastly
the superposition operations.

Lemma 13 (Normal form for compositions) Assume subsampling operations
all have a constant acceptance rate. A normal form for a composition of subsampling,
point transition and superposition operations is obtained by applying the following
rules until no more apply:

$$S^q(S^{q'}(\tilde\mu)) \to S^{qq'}(\tilde\mu),$$
$$S^q(T(\tilde\mu)) \to T(S^q(\tilde\mu)),$$
$$S^q(\tilde\mu \oplus \tilde\mu') \to S^q(\tilde\mu) \oplus S^q(\tilde\mu'),$$
$$T(\tilde\mu \oplus \tilde\mu') \to T(\tilde\mu) \oplus T(\tilde\mu').$$

The remaining top level set of superpositions is then flattened out by removing any
precedence ordering.
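The rewrite rules of Lemma 13 can be sketched as a small symbolic normaliser. This is an illustration only: it computes the normal form in one recursive pass rather than by step-by-step rewriting, it assumes constant acceptance rates and a single transition operator $T$, and the term encoding (`base`, `S`, `T`, `sup`) is hypothetical. Each summand `(name, q, j)` stands for $T^j(S^q(\text{name}))$, with $q = 1$ meaning no subsampling.

```python
def base(name):
    return ("base", name)


def S(q, term):
    return ("S", q, term)


def T(term):
    return ("T", term)


def sup(*terms):
    return ("sup", terms)


def normal_form(term):
    """Return the flattened list of summands (name, q, j) of the normal form."""
    kind = term[0]
    if kind == "base":
        return [(term[1], 1.0, 0)]
    if kind == "S":
        # S^q distributes over superposition, commutes with T, and
        # merges with an inner S^{q'} into S^{q q'}.
        _, q, inner = term
        return [(name, q * q0, j) for name, q0, j in normal_form(inner)]
    if kind == "T":
        # T distributes over superposition; count how often it is applied.
        return [(name, q0, j + 1) for name, q0, j in normal_form(term[1])]
    if kind == "sup":
        # Superposition is associative, so nested sums are flattened.
        return [s for child in term[1] for s in normal_form(child)]
    raise ValueError(f"unknown term kind: {kind!r}")


expr = S(0.5, T(sup(S(0.4, base("mu1")), base("mu2"))))
print(normal_form(expr))
```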
Note that Lemmas 10, 11, 12 and 13 all apply to NRMs as well due to Lemma 7.
We are now ready to state the main theorem about the relation between the CRM and
the corresponding NRM under the three dependency operations.

Theorem 9 (Equivalence Theorem) Assume the subsampling rates q(-) are in- 
dependent ( constant^ for each point of the corresponding Poisson process, the fol- 
lowing dependent random measures ffty an d fflty are equivalent: 

• Manipulate the normalized random measures: 

/4 ~ TiS^'m-x)) © M™, for m>\. (41) 



Manipulate the completely random measures: 

f(S q (fl , m _ 1 )) © /2 m , form>\. 



f"m ~ ~, /^r\ i (42) 



Furthermore, both resulting NRMs \J m 's are equal to: 

4r E^=i (<r~ y /V) ( x ) 

where $q^{m-j}\tilde\mu$ is the random measure with Lévy measure $q^{m-j}(dx)\,\nu(dt, dx)$, and
$\nu(dt, dx)$ is the Lévy measure of $\tilde\mu$. $T^{m-j}(\mu)$ denotes point transition on $\mu$
$(m - j)$ times.

In posterior sampling for the subsampling operation, we can prove the following
posterior of the Bernoulli variables.

Theorem 10 (Posterior acceptance rates for subsampling) Let $\tilde\mu' = \sum_k J_k \delta_{\theta_k}$ be a completely random measure on $\mathbb{X}$, and let $\tilde\mu = S^q(\tilde\mu') := \sum_k z_k J_k \delta_{\theta_k}$ be its
subsampling version, where the $z_k$'s are independent Bernoulli random variables with
acceptance rate $q$. Further define $\mu = \tilde\mu / \tilde\mu(\mathbb{X})$. Given $n = \sum_k n_k$ observed data in $\mu$,
the posterior of $z_k$ is:

$$p(z_k = 1 \mid \mu, n) = \begin{cases} 1 & \text{if } n_k > 0, \\ \dfrac{q/J^+}{q/J^+ + (1-q)/J^-} & \text{if } n_k = 0, \end{cases} \tag{43}$$

where $J^+ = \left(J_k + \sum_{k' \neq k} z_{k'} J_{k'}\right)^n$ and $J^- = \left(\sum_{k' \neq k} z_{k'} J_{k'}\right)^n$.
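The posterior of Theorem 10 is a two-way normalisation that is best evaluated in log space, because $J^+$ and $J^-$ are $n$-th powers. The sketch below assumes a constant acceptance rate $q \in (0,1)$; the function name and argument layout are hypothetical.

```python
import math


def posterior_accept_prob(k, jumps, z, counts, q):
    """P(z_k = 1 | jumps, the other z's, counts) for constant rate q.

    jumps  : list of jump sizes J_k of the unsubsampled CRM
    z      : list of 0/1 indicators (the entry z[k] itself is ignored)
    counts : list of n_k, the number of data attached to each jump
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in (0, 1)")
    if counts[k] > 0:
        return 1.0                     # data sit on J_k, so it was kept
    n = sum(counts)
    if n == 0:
        return q                       # no data: posterior equals prior
    rest = sum(j for i, (j, zi) in enumerate(zip(jumps, z)) if i != k and zi)
    if rest <= 0.0:
        raise ValueError("observed data require another retained jump")
    log_w1 = math.log(q) - n * math.log(rest + jumps[k])   # log(q / J^+)
    log_w0 = math.log(1.0 - q) - n * math.log(rest)        # log((1-q) / J^-)
    d = log_w0 - log_w1
    if d > 0:                          # stable logistic for either sign of d
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


print(posterior_accept_prob(2, [1.0, 2.0, 3.0], [1, 1, 0], [1, 1, 0], 0.5))
```

Because retaining an empty jump enlarges the normaliser, the posterior acceptance probability of a jump with no data is never larger than the prior rate $q$.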



5 This assumption is to deal with the case when considering point transition, meaning we can 
drop this assumption if no point transition operation is considered. 
Corollary 3 (Posterior acceptance rates in sampling $J'_{mk}$ in Section 4 of [CDB12])

Using the terminology as in Section 4 ICDB1BJ . the posterior p{z m k = M^m, {^mfe}) 
is computed as: 

• tfKnk > °; then P( z mk = Mfim, {n' mk }) = 1. 

• Otherwise, 

q m - m '/J m 



p(z mk = l\fl m , {n' mk }) 



q m - m '/J m + (1 - q rn - m ')lJ, 



fe' 

rn 



Where J m - (Y.m>< m J2k> Z m>k>Jm>k>) m ' , Jm - (j2m'<mJ2k'^k Z m'k'Jm'k' 

and h' m . = Efc'^mifc'- 
A Proofs

Proof of Lemma 2. We have $\nu(dx, dt/\lambda)$. Doing a change of variables $t' = t/\lambda$
and some rearranging of the Lévy-Khintchine formula yields the following:



E r e -/x(W)(£(d*)/A) 



-/ B+xX (l- e -*'C*/M))AKd*,dt') 



Since $\tilde\mu(dx)/\lambda$ normalises to the same measure as $\tilde\mu(dx)$, and saying something holds
for any $f(x)$ is the same as saying something holds for any $\lambda f(x)$ (when $\lambda > 0$), the
result follows.

Proof of Lemma 3. First, for the infinite mixture model, we have an infinite number
of components, thus given the observed data $\{x_1, \dots, x_N\}$ and their allocation
indicators $s$, the model likelihood is

$$f_\mu(x, s \mid \theta, J) = \prod_{i=1}^{N} \frac{J_{s_i}}{J_+}\, g_0(x_i \mid \theta_{s_i}),$$

where $J_+ = \sum_{k=1}^{\infty} J_k$. Now introduce the slice auxiliary variables $u$, one for each datum,
such that we only consider the components whose jumps are larger than a threshold
$u_i$ for data $x_i$. In this auxiliary space we have

$$f_\mu(x, u, s \mid \theta, J) = \frac{1}{J_+^N} \prod_{i=1}^{N} 1(u_i < J_{s_i})\, g_0(x_i \mid \theta_{s_i}).$$

Now using the fact that

$$\frac{1}{J_+^N} = \frac{\int_0^\infty U_N^{N-1} \exp\{-U_N J_+\}\, dU_N}{\Gamma(N)},$$

after introducing the auxiliary variable $U_N$, we have

$$f_\mu(x, u, s, U_N \mid \theta, J) \propto U_N^{N-1} \exp\{-U_N J_+\} \prod_{i=1}^{N} 1(u_i < J_{s_i})\, g_0(x_i \mid \theta_{s_i}).$$

Further decomposing $J_+$ as

$$J_+ = \sum_{k=1}^{K} J_k + J_*,$$

where $K$ is the number of jumps which are larger than a threshold $L$ and $J_* = \sum_{k=K+1}^{\infty} J_k$, we get

$$f_\mu(x, u, s, U_N \mid \theta, J_1, \dots, J_K, K) \propto U_N^{N-1} \exp\left\{-U_N \sum_{k=1}^{K} J_k\right\} E\left[\exp\{-U_N J_*\}\right] \prod_{i=1}^{N} 1(u_i < J_{s_i})\, g_0(x_i \mid \theta_{s_i}). \tag{44}$$

Now using the Lévy-Khintchine representation of a Lévy process (4) to evaluate
$E[\exp\{-U_N J_*\}]$, we get

$$f_\mu(x, u, s, U_N \mid \theta, J_1, \dots, J_K, K) \propto U_N^{N-1} \exp\left\{-U_N \sum_{k=1}^{K} J_k\right\} \exp\left\{-M \int_0^L \left(1 - \exp\{-U_N t\}\right) \rho_\eta(dt)\right\} \prod_{i=1}^{N} 1(u_i < J_{s_i})\, g_0(x_i \mid \theta_{s_i}). \tag{45}$$

Now combining with the priors 



p(ji, • • • , .7*:) = n 



i JTW*) d *' 

A,(dt)), ~ ^(0*), 



the result follows. 
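The step that introduces $U_N$ rests on the Gamma identity $J_+^{-N} = \frac{1}{\Gamma(N)}\int_0^\infty U^{N-1} e^{-U J_+}\,dU$. A quick numerical check is sketched below; the function name and the midpoint rule with the substitution $U = x/(1-x)$ are choices made here, and the code assumes a moderate $N$ so that `U ** (N - 1)` stays within floating-point range.

```python
import math


def inverse_power_via_gamma(J, N, n=200_000):
    """Approximate (1 / Gamma(N)) * int_0^inf U^(N-1) exp(-U J) dU.

    By the Gamma integral this should equal J ** (-N).
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h                # midpoint in (0, 1)
        U = x / (1.0 - x)                # maps (0, 1) onto (0, inf)
        total += U ** (N - 1) * math.exp(-U * J) / (1.0 - x) ** 2
    return h * total / math.gamma(N)


print(inverse_power_via_gamma(2.5, 5), 2.5 ** -5)
```

This is why the normaliser $J_+^N$ can be traded for an exponential term in $U_N$, which then factorises over the individual jumps.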



Proof of Theorem 2. The definition for $\tau_n(u)$ comes from Proposition 1 of [JLP09].
The posterior marginal of Equation (21) comes from Proposition 3 of [JLP09] and is
simplified using the change of variables $t = M(1+u)^a$. For the predictive posterior,
the weights in Equation (23) are derived directly from the posterior. The posterior
proportionality for $p(U_N = u \mid X, \mathrm{NGG}(a, M, H(\cdot)))$ discards terms not containing $u$.

Proof of Corollary 1. Marginalise out $M$ from the posterior of Equation (21) using
the alternative definition of $T^{N,K}_{a,M}$. It can be seen that this yields the posterior of a
Poisson-Dirichlet distribution with discount parameter $a$ and concentration parameter
$b$. Since the posteriors are equivalent for all data, the distributions are equivalent
almost surely.

Proof of Lemma 8. Equation (24) holds by noticing $1 - (M/t)^{1/a} < 1$. To prove
Equation (25), first prove

$$T^{N,K}_{a,M} = \sum_{n=0}^{N-1} \binom{N-1}{n} (-1)^n M^{n/a}\, \Gamma\!\left(K - \frac{n}{a}, M\right) \qquad \forall N, K \in \mathbb{N}^+. \tag{46}$$

This holds by expanding the term $\left(1 - (M/t)^{1/a}\right)^{N-1}$ using the binomial expansion
and absorbing the powers $1/t^{n/a}$ into the $t^{K-1}$ as an incomplete Gamma integral.

Now manipulate Equation (46). Expanding $\Gamma\!\left(K - \frac{n}{a}, M\right)$ using the recursion for
the incomplete gamma function, which can be applied when $K - \frac{n}{a} \neq 1$, yields



rK--i-? e -M 



e(V)(-«"t((--^(-^)«' 

n=0 x 7 

e f; l ) i-^r (*-- i -s) r (* - 1 - - ^ e ( w 

n=0 v 7 n=0 v 
The second sum is a binomial expansion of $(1 - 1)^{N-1}$ and therefore disappears.
Apply this step repeatedly to get Equation (25). Note for the chain of expansions
to be done, it must be the case that $k - \frac{n}{a} \neq 1$ for $n = 0, \dots, N-1$ and $k = 2, \dots, K$,
so $\frac{n}{a} \neq k$ for $n = 0, \dots, N-1$ and $k = 1, \dots, K-1$, so $a \neq \frac{n}{k}$ for $n = 1, \dots, N-1$ and
$k = 1, \dots, K-1$, so $ka$ cannot be integral for $k = 1, \dots, K-1$.
Equation (126]) holds by applying the integration by parts formula on the terms 

/ l/a\ N ~ 1 

A(t) = ( 1 — (^) ] and B(t) = t KJrX e~ l and rearranging the resultant integrals 

using (y-) 1 ^ = 1 — fl — (t) 1 ^) ^° arr i ve back at terms representable. Note that 
A(t)B(t)Q = 0. Equation (T27]) holds by expanding 

(7V+1)-1 / , / \ N-l / 1 / \ iV-1 , / 

^f) " -(f) (f) 




inside the integral definition of T a 



N+1,K 
M 



Proof of Theorem 3. The posterior marginal of Equation (28) comes from Proposition 4
of [JLP09]. Although the denominator is difficult to evaluate, and it can be
derived through a recursion, the easiest way is simply to normalise the numerator.
Summing $(Ma(1+u)^a)^K \prod_{k=1}^{K} (1-a)_{n_k-1}$ over all length $K$ partitions $(n_1, n_2, \dots, n_K)$
yields $(Ma(1+u)^a)^K S^N_{K,a}$, and the result follows by again summing over $K$. The
predictive posterior, as before, follows directly from the posterior marginal. The posterior
proportionality for $U_N$, $p(U_N = u \mid X, \mathrm{NGG}(a, M, H(\cdot)))$, comes from Proposition 4
of [JLP09] after discarding terms not containing $u$. The normalising constant
is obtained using the methods of Theorem 2.

Proof of Theorem 5. This comes from [GW11] at the end of Section 3, and includes
the prior on $K_L, J_1, \dots, J_{K_L}$ described in Section 4. The mixture model component
$k(y_i \mid \theta_{s_i})$ has also been stripped and the slice sampling variables marginalised
out.

Proof of Corollary [2j Equation ( 13T1) can be seen to hold true since conditioning 
it on Un = u and X yields respectively Equation (125]) and Equation (130]) . 

Prove Equation (32) as follows. The likelihood $p(X_1, \dots, X_N \mid \tilde\mu)$ is given by
$\prod_{k=1}^{K} J_k^{n_k} / T^N$ where $T = \sum_{k=1}^{\infty} J_k$ is the total sum of jumps. We first simplify this using
a latent relative mass variable. Introduce the variable $U = \gamma/T$ for $\gamma \sim \Gamma(N, 1)$.
Adding the term $p(\gamma)\,d\gamma$ to the likelihood and making a change of variable using
$U = \gamma/T$ yields

$$p(\gamma)\,d\gamma = \frac{1}{\Gamma(N)} (UT)^{N-1} e^{-UT}\, T\, dU = T^N\, \frac{U^{N-1} e^{-UT}}{\Gamma(N)}\, dU.$$

Thus

$$p(X_1, \dots, X_N, U \mid \tilde\mu) = \frac{U^{N-1} e^{-U T^\circ}}{\Gamma(N)} \prod_{k=1}^{K} e^{-U J_k} J_k^{n_k},$$



where $T^\circ$ is the total of the jumps for the unobserved data. Now while the prior for the
jumps $\rho_{a,M}(t)$ is unnormalised, with observed data it becomes normalised. Thus

$$p(X_1, \dots, X_N, U, J_1, \dots, J_K \mid T^\circ, \mathrm{NGG}(a, M, H(\cdot))) \propto U^{N-1} e^{-U T^\circ} \prod_{k=1}^{K} e^{-U J_k} J_k^{n_k}\, \rho_{a,M}(J_k)\, H(X_k^*).$$



Taking the expectation of $e^{-U T^\circ}$ over the remainder term of the measure $\tilde\mu$ corresponds
to the Lévy-Khintchine formula, and thus

$$p(X_1, \dots, X_N, U, J_1, \dots, J_K \mid \mathrm{NGG}(a, M, H(\cdot))) \propto \frac{U^{N-1} e^{-M((1+U)^a - 1)}}{\Gamma(N)} \prod_{k=1}^{K} e^{-(1+U) J_k} J_k^{n_k - a - 1}\, H(X_k^*).$$

Adding in the terms for $p(X_1, \dots, X_N, U \mid \mathrm{NGG}(a, M, H(\cdot)))$ yields the result and
reveals the normalisation constant.



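Proof of Lemma 9. This uses a similar technique to that of Theorem 1 in [GKS11].
Using the identity $1/b = \int_0^\infty e^{-vb}\, dv$ we get

$$E[\mu(B)] = E\left[\frac{\tilde\mu(B)}{\tilde\mu(\mathbb{X})}\right] = \int_0^\infty E\left[\tilde\mu(B) \exp\{-v\tilde\mu(B)\}\right] E\left[\exp\{-v\tilde\mu(\mathbb{X} \setminus B)\}\right] dv. \tag{47}$$

According to the Lévy-Khintchine representation of $\tilde\mu$ and definition (6), we have

$$E\left[\exp\{-v\tilde\mu(B)\}\right] = \exp\left\{-P(B) M \tilde\psi_\eta(v)\right\}, \tag{48}$$

$$E\left[\tilde\mu(B) \exp\{-v\tilde\mu(B)\}\right] = -E\left[\frac{d}{dv} \exp\{-v\tilde\mu(B)\}\right] = P(B) M \tilde\psi_\eta'(v) \exp\left\{-P(B) M \tilde\psi_\eta(v)\right\}, \tag{49}$$

$$E\left[\tilde\mu(B)^2 \exp\{-v\tilde\mu(B)\}\right] = E\left[\frac{d^2}{dv^2} \exp\{-v\tilde\mu(B)\}\right] = \left(P(B)^2 M^2 \left(\tilde\psi_\eta'(v)\right)^2 - P(B) M \tilde\psi_\eta''(v)\right) \exp\left\{-P(B) M \tilde\psi_\eta(v)\right\}. \tag{50}$$

Substituting (48) and (49) into (47) and using the fact in (7), after simplifying
we have

$$E[\mu(B)] = P(B).$$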






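Since $\mathrm{Var}(\mu(B)) = E[\mu(B)^2] - (E[\mu(B)])^2$, and the last term is equal to $(P(B))^2$,
we now deal with the first term.

$$E[\mu(B)^2] = E\left[\frac{\tilde\mu(B)^2}{\tilde\mu(\mathbb{X})^2}\right] = \int_0^\infty\!\!\int_0^\infty E\left[\tilde\mu(B)^2 \exp\{-v_1\tilde\mu(\mathbb{X}) - v_2\tilde\mu(\mathbb{X})\}\right] dv_1 dv_2 \tag{51}$$

$$= \int_0^\infty\!\!\int_0^\infty E\left[\tilde\mu(B)^2 \exp\{-(v_1 + v_2)\tilde\mu(B)\}\right] E\left[\exp\{-(v_1 + v_2)\tilde\mu(\mathbb{X} \setminus B)\}\right] dv_1 dv_2.$$

Substituting (48) and (50) into (51) we have

$$E[\mu(B)^2] = \int_0^\infty\!\!\int_0^\infty \left(P(B)^2 M^2 \left(\tilde\psi_\eta'(v_1 + v_2)\right)^2 - P(B) M \tilde\psi_\eta''(v_1 + v_2)\right) \exp\left\{-M \tilde\psi_\eta(v_1 + v_2)\right\} dv_1 dv_2. \tag{52}$$

Furthermore, letting $v = v_1 + v_2$ and $B = \mathbb{X}$ in (50), after integrating out $v_1, v_2$ in $[0, \infty]$, we
have

$$\int_0^\infty\!\!\int_0^\infty M^2 \left(\tilde\psi_\eta'(v_1 + v_2)\right)^2 \exp\left\{-M \tilde\psi_\eta(v_1 + v_2)\right\} dv_1 dv_2 = 1 + \int_0^\infty\!\!\int_0^\infty M \tilde\psi_\eta''(v_1 + v_2) \exp\left\{-M \tilde\psi_\eta(v_1 + v_2)\right\} dv_1 dv_2. \tag{53}$$

Substituting (53) into (52) and simplifying, we get

$$\mathrm{Var}(\mu(B)) = P(B)\,(P(B) - 1)\, M \int_0^\infty\!\!\int_0^\infty \tilde\psi_\eta''(v_1 + v_2) \exp\left\{-M \tilde\psi_\eta(v_1 + v_2)\right\} dv_1 dv_2. \tag{54}$$

Now using the change of variables $v_1' = v_1$, $v_2' = v_1 + v_2$ and simplifying, we get the
result of (34).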



Proof of Theorem [6] From the definition we have 



Cov(ti k (B),fi(B)) = J2 Cov 



M 



i=l 



Hi{B),n k {B) 



Cov 



E 



M, 



^k(B), l^k(B) J + ^ Cov ( 



£./2,(X) Wx 



E 



jik(B) 
E 7 -£;(X) 



EjMj 

' h(B) 

./ifc(X) 



^(S),^ fc (B) (55) 



E 



fii{B)fL k {B) 



EjAjW J /ifc(X) 



E 



E 7 Ai(x) 



E 



Afc(X) 
Note that for the Dirichlet process, the last $n - 1$ terms of (55) vanish because
the $\mu_j$'s are independent of their total masses $M_j$, but this is not the case for general
NRMs. Now we calculate these term by term.

For the first term, we have 



E 



^■(X) WX) 



E 



o Jo 

OO POO 



p, k {B) 2 exp I -vi(*>2fij)QC) -^Afc(X) 



dv idv 2 



o jo 



E [jl k (B) 2 exp {-fa + ^2)^.(5)}] E [exp {-( Vl + w 2 )/i fc (X \ 5)}] 



E 



dt>idf 2 



PtBfMffi^y - P{B)M k ^{v 1 ))ex V -M k ^ v ( Vl ) 



^0 



exp <^ -(y^Mj)^^) > dwidw 2 



P(B)M k jf 7 (M fc , P(P), u) exp j -(^ Mj)^) n {v) j di; 



(56) 



For the second term, we have 



E 



E 



/2 fc (P) 
Jtjt(X) 



P(P) / E 
'o 



/i fc (.B) exp | -u /2j (X) ^ dw 
P{B) 2 M k I ^(^expi-^M^^dr 



P(P) 2 M fc exp - [J2 j M j ; ^(0) 



P(P) 2 M fc 



(57) 
For the third term, similarly
fli{B)il k {B) 



E 



OO POO 



E 



JO 

OO POO 



pLi(B)p, k (B) exp <^ -^(^/^(X) - v 2 fl k (X) 



dfidf 2 



o Jo 



E [/} fc (B) exp {-(«! + u 2 )/iA(S)}] E [exp {-(^ + ^)/2 fc (X \ £)}] 



E [/2i(S) exp {— fi/ii(-B)}] E [exp {-^(X \ £)}] E 



P(B)M k 4>' v ( Vl + v 2 ) exp {-M fc ^(«i + *; 2 )} 



df idf 2 



o ./o 



P{B)M i il)' r) {v 1 ) exp <^ -Mj^(f i) 



exp < -( Mj)^(vi) > dwidw 2 
P(B) 2 M t M k [ $' ( Vl ) exp | -( V Mj)^ v (vi) 



P(B) 2 M, 
P(B) 2 M, 



2 ■ - 1 1 



i>' v (v 2 ) exp |-Af fc ^(u2)| df 2 dwi 
exp<j-(^M,)^(0) 



The fourth term is similar to the second term, and is equal to 



(5f 



E 



MB) 

E,ft(x) 



E 



' MB) 

.Afc(x). 



P( J B) 2 M i exp{-(E J M i )^(0)} 

E^ 

PjBfM, 
E 3 M, 



(59) 



The result follows. 

Proof of Theorem 7. By subsampling, we obtain two independent NRMs $\mu^q$ and
$\mu^q_0$, corresponding to those points selected and those rejected by the independent
Bernoulli trials, respectively.

We denote the total mass of the corresponding unnormalized $\tilde\mu^q$ as $M_q$, and that of
$\tilde\mu^q_0$ as $M^0_q$. From the definition of subsampling, we have

$$M_q := (q\tilde\mu)(\mathbb{X}) = \int_{\mathbb{X}} q(x)\, \tilde\mu(x)\, dx, \qquad M^0_q = M - M_q.$$

Furthermore, notice that the original NRM $\mu$ is the superposition of $\mu^q$ and $\mu^q_0$.
Thus according to Theorem 6 the covariance between $\mu$ and $\mu^q$ is



P(B)M q J™ 7 (M„ P(P), u) exp j — (M - M 9 )^„(v) } dv + P{B) 



, , 2M, - M 



Proof of Theorem 8. Note that $\mu$ and $T\mu$ are not independent, thus they cannot
be separated when taking the expectation. Now let $A$ and $B$ be defined as in the
theorem, then:



E[//(P) ((!>)(£))] =E 



/2(X) /2'(X) 



E 



g(g)gOjj 

_/i(X) /2(X)_ 



E [/i(P)/i(A) x exp {-(^ + u 2 )/2(X)}] dwidw 2 



o </o 

oo /-oo 



^0 



JO 



E\Jl(B)exp{-(v 1 + v 2 )fl(B)}} 

E[jl(A)exp{-(v 1 + v 2 )jl(A)}] 

E [exp {-( Vl + u 2 )/i(X/{4 U B})}} d Vl dv 2 

P{B)M^{v x + v 2 ) exp |-P(B)M^(vi + u 2 
P(A)M^'(v 1 + v 2 ) exp {-P(A)M^(t; 1 + v 2 



P(X/{A U fi})M^;( Wl + v 2 ) exp |-P(X/{A U B})M^{v x + v 2 ) j d^di; 2 
P(A)P(B)M 2 j ^(v. fexp^-M^v^dv^ 



'0 ./0 

Then the covariance is: 



Cov(//(P),CZ>)(P)) 

E [/i(P) ((7»(P))] - E fr(B)] E [(7»(P)] 
P(A)P(P) 



M Jo Jo ^ v ^ 2 ^{- M ^2))dv 2 dv 1 -l ) j (60) 

Proof of Lemma 10. From the existence of Poisson processes, each Lévy measure
$\nu_i(dt, dx)$ corresponds to a Poisson random measure $N_i(dt, dx)$ with

$$E[N_i(dt, dx)] = \nu_i(dt, dx).$$

Also we have, for all $i$,

$$\tilde\mu_i(dx) = \int_0^\infty t\, N_i(dt, dx).$$

Thus from (39) we have

$$\tilde\mu(dx) = \int_0^\infty t \left(\sum_{i=1}^{n} N_i(dt, dx)\right) = \int_0^\infty t\, N(dt, dx),$$

where $N(\cdot) = \sum_{i=1}^{n} N_i(\cdot)$ is again a Poisson random measure. Thus the Lévy intensity
for $\tilde\mu(\cdot)$ is

$$\nu(dt, dx) = \sum_{i=1}^{n} \nu_i(dt, dx). \tag{61}$$

Because Theorem 1 in [JLP09] applies for any CRM with Lévy measure $\nu(dt, dx)$,
the lemma is proved.

Proof of Lemma 1111 This follows by merging the impact of the subsampling op- 
eration with the sampling step in Lemma [TJ Suppose the Levy measure is in the 
form $M\rho(dt \mid x)H(dx)$. The infinitesimal rate at data point $x_i$ when sampling the
jump is now $q(x_i) M\rho(dt \mid x)$. Thus the Lévy measure for the subsampled measure
must be $M\rho(dt \mid x)\, q(x) H(dx)$.

This argument can be seen from the detailed derivation below. First note that
$S^q(\tilde\mu)$ is equivalent to

$$S^q(\tilde\mu) = \int_{\mathbb{R}^+ \times \mathbb{X}} z(dx)\, s\, N(ds, dx). \tag{62}$$

Let $B \in \mathcal{B}(\mathbb{X})$. We divide $B$ into $n$ non-overlapping patches and use $A_{nm}$ to denote the $m$-th
patch of them. So we have

] 



n— >oo 








Ejv(-),2 












J_J_ A/V(-),z L e J 










e T.A nm eB^s{^N(.)4 e ~ Uz(Anm)SnmN{Anm ' 8nm 


(a) 


e T, Anrn eB E N(-), z [ e ' uz(Anm)s " mN{Anm,s " m) - 1 


(6) 




eB E iV(-) [e-**"™^^™' 8 ™™)-!] 


n— >oo 





(63) 

Here (a) above follows because $E_{N(\cdot)}\left[e^{-u z(A_{nm}) s_{nm} N(A_{nm}, s_{nm})} - 1\right]$ is infinitesimal,
thus $\log(1 + x) \sim x$ applies. (b) is obtained by integrating out $z(A_{nm})$ with its
Bernoulli distribution. Thus it can be seen from (63) that $S^q(\tilde\mu)$ has the Lévy
measure $q(dx)\nu(dt, dx)$.
Proof of Theorem 9. We show that, starting from either (42) or (41), we end
up with the random measures defined in (43).

First, for the operations in ( H2"j) . adapting from Theorem 2.17 of CIO , a Poisson 



random measure with mean measure v on the space R + x X has the form 

oo 

N = E E «w ( 64 ) 

n=l i<if n 

where is a Poisson distributed random variable with mean z/, and (s G IR + , iGX) 
are points in the corresponding Poisson processes. Then a realization of iV composes 
of points in a Poisson process 111, and the corresponding Poisson random measure 
can be written as iVi = Y,( 8 ,x)&h 'W) - 

Now consider doing a subsampling S q and a point transition T on IT, by the 
definitions and f )64|) we get a new random measure 

N = T(S q (N 1 )) = T(S q (£s {s>x) )) 
(*) \ ^ (**) \ 

= 2^ z ( q ( T ( x >» S ('> T to) = l^ z ( q ( x )) S (s,T(x)), (65) 

where $z(q(\cdot))$ means a Bernoulli random variable with acceptance rate $q(\cdot)$, (*)
follows from definitions, and (**) follows from the assumption of a constant subsampling
rate.

It is easy to show by induction that by subsampling and point transitioning the Poisson process $\Pi_1$ $i$
times, we get a random measure

N' = Y,*{<i{x))hs,T^)y (66) 

By the definition, when superpositioning the this Poisson process T" l (5f (IT)) 
with another Poisson process n 2 with mean measure v 2 -, we get another random 
measure as 

N"= ]T z{q\x))5 [s ^ {x)) + W))- (67) 

This Poisson random measure is then used to construct a completely random 
measure p, using (EJ) as: 



fl(A) = / sN"(ds,dx) 

JR+xX 

Y ^(^(^))^(s,T'(x)) + S6 (s,x)- (6^ 
By marginalizing over the $z$'s and normalizing this random measure, we get

M[ E(,,x)gn in A sS (s,Ti(x)) 
Mi + M' 2 E( s ,x)Gn in x s5 (s,THx)) 
M' 2 E(s,x)gn 2 nA s5 (s,T'(x)) 
M \ + M 2 E(,,x) G n 2 nx sS (:T*(x)) 



+ 



7 (Tu 1 )(A)+ , 2 , (T l a 2 )(A), 



(69) 



where by applying Lemma 11 we conclude that $M'_1 = (q^i \tilde\mu_1)(\mathbb{X})$ is the total mass of the
random measure with Lévy measure $q^i(dx)\nu(dt, dx)$ and $M'_2 = \tilde\mu_2(\mathbb{X})$. We use the
fact that $(T^k \tilde\mu_j)(\mathbb{X}) = \tilde\mu_j(\mathbb{X})$ in the derivation of (69), because the point transition
operation only moves the points $(s, x)$ of the Poisson process to other locations
$(s, x + dx)$, and thus does not affect the total mass of the corresponding random measure.

This means that by superpositioning after subsampling, the mass of the normalized
random measure decays exponentially fast with respect to the distance $i$. Based on
Eq. (69), when taking $i$ from 1 to $n$, and taking the superposition of all the random
measures induced, the resulting normalized random measure is:

a4 = V vJ 9 T^ )(X L^ w 'V»)- (70) 
EJ =1 ^ti) (x) ^ 1 ; 



Next, for the operations in (JHJ), from the definition we have 

/4 = rosvo)©^ 



te)(x) -r(^) + 7 M!L M2 (71) 



(g/ii + /i 2 ) (X) (g/ii + /i 2 ) (X) 

Now /4 has a total mass of (qfii + /t 2 )(X), by induction on i, we get the formula in 
(13]) for i = n. 

This completes the proof. 

Proof of Theorem 10. Given the current data configuration $\{n_k, k = 1, 2, \dots\}$,
for a particular $k$,

• If $n_k > 0$, this means this jump $J_k$ must exist in $\mu$, otherwise it is impossible
to have $n_k > 0$, thus $p(z_k = 1 \mid \mu, n) = 1$.

• Otherwise, since $\mu = \dfrac{\sum_{k'} z_{k'} J_{k'} \delta_{\theta_{k'}}}{\sum_{k'} z_{k'} J_{k'}}$, we have the likelihood as:

$$\frac{\prod_{k'' : n_{k''} > 0} J_{k''}^{n_{k''}}}{\left(\sum_{k'} z_{k'} J_{k'}\right)^n}.$$

Furthermore, we know that the prior for $z_k$ is $p(z_k = 1) = q$, thus the posterior
is:

$$p(z_k = 1 \mid \mu, n) \propto \frac{q}{\left(\sum_{k' \neq k} z_{k'} J_{k'} + J_k\right)^n},$$

$$p(z_k = 0 \mid \mu, n) \propto \frac{1 - q}{\left(\sum_{k' \neq k} z_{k'} J_{k'}\right)^n}.$$

After normalizing, we get the posterior for the case $n_k = 0$ in (43).

Proof of Corollary [3] Note that J^ fc is obtained by subsampling of {J m /fc,m' < 
m}, the number of data points in p! m is denoted as h' m . = J2 k , h' mk ,. 

Following the same arguments as in the proof of Theorem [10J when n' mk > 0, 
v( z mk = l\fim,n' m -) = 1- Otherwise, by subsampling, /j,' m can be written as: 

/ \ ^ \ ^ z m'k' Jm'k'$O m i k i 

A*m — / j / j ~ j • 

in' <m k':z m / k /=l ft 

Now following the same proof of Theorem [TD], if we define 

•^m = ( ^ ^ ^ z m'k' J'm'k' J j ^ m | ^ ^ ^ ^ ^m'k'Jm'k' 

\m'<m k' / \m'<mk'y^k 



then we get the likelihood as 



nV n k" 
k":h' ,„>0 J mk" 

mfc 

Jin, 



Furthermore, from subsampling, we know that the Bernoulli prior for $z_{mk}$ is $q^{m-m'}$,
and the posterior can then be derived using Bayes' rule as in the proof of Theorem 10.



